(57) Abstract: To improve the performance of speech recognition
under mobile conditions, it is customary for speech material to
be collected in order to be able to build more accurate models
of the speech. However, the error correction is regularly
changed by the manufacturer, as a result of which the mismatch
between training and reality increases. In addition, transmission
errors are currently "taken care of" by including them in the
training process, which increases the chance of "garbage in,
garbage out". In order to overcome said drawbacks, the information
available downstream (1, 2) in the frames on the frame quality
(BFI) and on the presence of speech (SP) is used to dynamically
control the upstream speech recogniser (20). The result is that,
of frames presumed incorrect, only the correct part is used, and
frames in which no speech was transmitted, but in which there is
silence, are ignored by the speech recogniser.
BACKGROUND OF THE INVENTION

The invention relates to a speech-processing system,
comprising speech-recognition means for processing a signal
(DATA) fed from a source to a speech input of said
speech-processing system.

It is known that the quality of speech recognition at the
receiving side of, e.g., a GSM link (GSM = Global System for
Mobile communications) is currently insufficient. If the
recogniser is located within the network, the recognition result
on the received and decoded GSM speech signal is partly affected
by the artificially generated (comfort) noise that is added on
the basis of the silence detected at the transmitting side, and
by the noise and disturbances resulting from decoded
transmission errors on the radio path. To improve recognition,
it is customary to collect speech material that has been
transmitted by way of GSM and to use said material to develop
new speech models, which are trained on speech signals containing
(artificially generated) noise and distortions due to
transmission errors, as a result of which the mismatch between
the training situation and the recognition reality may be
reduced.

The known approach has the following drawbacks: training on the
received and decoded speech signals improves the performance of
the speech recogniser only marginally, since:

1) decoding of, e.g., encoded GSM signals is not standardised
(only the encoding is standardised), which means that in
practice situations arise in which the speech recogniser is
trained on the output of a GSM speech decoder other than the
one used at the input of the recogniser. The error correction
applied in the decoder, for example, is changed regularly
whenever the manufacturer has found an improved way of
processing transmission errors (which give rise to damaged
speech) such that a large part of said errors is hidden (and
therefore hardly or not at all noticeable to the human ear).
This results in a mismatch between the training set on which
the speech models are based and the actual speech.
Ignored or adjusted, e.g., partially processed. Apart from the
parameters referred to above, the BFI and the SID, use is also
made of an encoding-mode parameter (CM) defining the significance
of the speech-frame bits (FR [Full Rate], EFR [Enhanced Full
Rate], or the various modes under which AMR [Adaptive Multi
Rate] may operate). On the basis hereof, the recognition
algorithm operative in the speech recogniser is adjusted to the
characteristics with which the speech signal is encoded and
decoded.
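The encoding-mode parameter CM can be pictured as a key that selects recogniser settings. The following is a minimal sketch, assuming the recogniser keeps one pre-trained model set per codec mode; RecogniserConfig, MODE_TABLE, configure_for_mode and the model-set names are hypothetical, and only a few of the AMR modes are listed. The bit rates are the nominal GSM codec rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecogniserConfig:
    model_set: str       # hypothetical name of the acoustic-model set for this mode
    bitrate_kbps: float  # nominal bit rate of the codec mode


# Hypothetical mapping from the encoding-mode parameter CM to recogniser
# settings. Only a subset of the AMR modes is listed.
MODE_TABLE = {
    "FR": RecogniserConfig("models_fr", 13.0),
    "EFR": RecogniserConfig("models_efr", 12.2),
    "AMR_12.2": RecogniserConfig("models_amr_12_2", 12.2),
    "AMR_7.95": RecogniserConfig("models_amr_7_95", 7.95),
    "AMR_4.75": RecogniserConfig("models_amr_4_75", 4.75),
}


def configure_for_mode(cm: str) -> RecogniserConfig:
    """Return the recogniser settings matching the encoding mode CM."""
    try:
        return MODE_TABLE[cm]
    except KeyError:
        # Refuse to guess: an unknown mode would reintroduce training/recognition mismatch.
        raise ValueError(f"unsupported encoding mode: {cm}") from None


print(configure_for_mode("EFR").model_set)  # models_efr
```

Raising an error on an unknown mode, rather than silently falling back to a default model set, is a design choice made here to avoid exactly the training/recognition mismatch described in the background; a documented fallback would be an equally reasonable alternative.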

DESCRIPTION OF THE FIGURES

The operation of the invention is further explained with
reference to several figures. As an example, we take the current
part of the GSM system which makes use of an Enhanced Full Rate
(EFR) codec [?]. The same does not apply, however, to a Full
Rate (FR) codec, nor to the (future) Adaptive Multi Rate (AMR)
codec. FIG. 1 shows two terminals that are capable of
communicating with one another by way of a wireless medium 9: a
first, mobile terminal, such as a GSM handset, and a second,
non-mobile terminal, such as a GSM base station. The figure shows
only upstream communication, from handset to base station.

The handset shown in the top part of FIG. 1 comprises two
modules or subsystems, namely a TX/DTX Handler 1 (DTX stands for
Discontinuous Transmission) and a Tx Radio Subsystem 2. Module 1
comprises a microphone 3, a speech encoder 4 and a Voice Activity
Detector (VAD) 5. Module 2 comprises a channel encoder 6, a
Speech-flag monitor 7 and a transmitter 8. Signals received by
the microphone 3 are fed both to the speech encoder 4 and to the
VAD 5.
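The data flow through module 1, in which the microphone signal is fed in parallel to the speech encoder and the VAD and one SP flag is produced per frame, can be sketched as below. This is an illustration under stated assumptions: the encoder is a pass-through placeholder, the VAD is a simple energy threshold (the GSM VAD is considerably more elaborate), and TxFrame, toy_encode, toy_vad and tx_dtx_handler are hypothetical names. Frames of 160 samples correspond to 20 ms at 8 kHz.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TxFrame:
    payload: List[float]  # placeholder for the speech encoder 4 output
    sp: int               # SPeech flag from VAD 5: 1 = speech, 0 = silence


def toy_encode(samples: List[float]) -> List[float]:
    """Placeholder for speech encoder 4: passes the samples through unchanged."""
    return list(samples)


def toy_vad(samples: List[float], threshold: float = 0.01) -> bool:
    """Hypothetical energy-threshold VAD; the GSM VAD is far more elaborate."""
    energy = sum(s * s for s in samples) / len(samples)
    return energy > threshold


def tx_dtx_handler(samples: List[float], frame_len: int = 160) -> List[TxFrame]:
    """Feed each frame to both the encoder and the VAD, as in module 1."""
    frames = []
    # Only whole frames are processed; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        chunk = samples[start:start + frame_len]
        frames.append(TxFrame(toy_encode(chunk), int(toy_vad(chunk))))
    return frames


silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
print([f.sp for f in tx_dtx_handler(silence + tone)])  # [0, 1]
```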

The VAD 5 detects whether the microphone 3 is receiving speech
or silence. The result is encoded as a "SPeech flag" (SP), which
is sent along with each speech frame. In the channel encoder 6,
the microphone signal encoded in encoder 4 is formatted into
frames that can be transmitted by way of transmitter 8.
Redundant information is added to the frames, such as a checksum
(CRC, cyclic redundancy check), on the basis of which the
receiving side can determine whether the frame has been
transmitted correctly. In specific cases, an
decoder 16, to the speech recogniser 20 as well. Upon receipt of
said BFI, the speech recogniser 20 either ignores the input
offered, or nevertheless attempts to recognise that part of the
frame which may still be marked as correct (even though the BFI
has been set). In other words, the value of the BFI parameter
acts as a control parameter for the speech recogniser, as a
result of which it processes only correct frames or correct
parts of frames.
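At the receiving side, the BFI (Bad Frame Indicator) and the speech/silence indication derived from the SP flag (conveyed as the SID parameter, as described further on) can act as control inputs that decide, per frame, whether the recogniser uses the frame in full, uses only part of it, or ignores it. A minimal sketch, assuming a per-feature reliability mask is available when the BFI is set; RxFrame, route_frame and the reliable field are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RxFrame:
    features: List[float]                  # feature vector of the decoded frame
    bfi: bool                              # Bad Frame Indicator from channel decoding
    speech: bool                           # speech/silence indication carried by SID
    reliable: Optional[List[bool]] = None  # hypothetical per-feature reliability mask


def route_frame(frame: RxFrame) -> str:
    """Return 'full', 'partial' or 'skip' for one received frame."""
    if not frame.speech:
        return "skip"     # silence: recogniser disabled for this frame
    if not frame.bfi:
        return "full"     # frame received correctly
    if frame.reliable is not None and any(frame.reliable):
        return "partial"  # BFI set, but part of the frame is still usable
    return "skip"         # wholly incorrect frame: ignored


frames = [
    RxFrame([1.0, 2.0], bfi=False, speech=True),
    RxFrame([1.0, 2.0], bfi=True, speech=True, reliable=[True, False]),
    RxFrame([0.0, 0.0], bfi=True, speech=True, reliable=[False, False]),
    RxFrame([0.1, 0.1], bfi=False, speech=False),
]
print([route_frame(f) for f in frames])  # ['full', 'partial', 'skip', 'skip']
```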

Of frames marked as broken, an attempt is made to use only the
part that is still correct, and frames marked as wholly
incorrect are ignored. That part of a frame may still be correct
when the BFI flag is set is due to the bits in the speech frames
being divided into several classes (in GSM: 1A, 1B and 2).
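One way to use only the part of a frame that is still correct is missing-feature scoring: unreliable feature dimensions are simply left out of the acoustic score. The sketch below assumes diagonal-Gaussian state models and a boolean reliability mask (both are assumptions; the text does not specify the acoustic model), and masked_log_likelihood is a hypothetical name. For a diagonal Gaussian, dropping a dimension's term is exactly marginalising that dimension, so a corrupted feature has no effect on the score.

```python
import math
from typing import Sequence


def masked_log_likelihood(x: Sequence[float], mean: Sequence[float],
                          var: Sequence[float], reliable: Sequence[bool]) -> float:
    """Diagonal-Gaussian log-likelihood over the reliable dimensions only.

    Leaving out a dimension's term equals marginalising it out of a diagonal
    Gaussian, so a corrupted feature has no effect on the score.
    """
    total = 0.0
    for xi, mi, vi, ok in zip(x, mean, var, reliable):
        if ok:
            total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


# Second feature is corrupted and happens to match the wrong state b.
x = [0.0, 5.0]
mean_a, mean_b, var = [0.0, 0.0], [4.0, 5.0], [1.0, 1.0]
both = [True, True]
first_only = [True, False]
print(masked_log_likelihood(x, mean_a, var, both) > masked_log_likelihood(x, mean_b, var, both))              # False
print(masked_log_likelihood(x, mean_a, var, first_only) > masked_log_likelihood(x, mean_b, var, first_only))  # True
```

With the corrupted dimension included, the wrong state b scores higher; masking it restores the ranking in favour of state a.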

Not every class is "protected" in the same manner by adding
redundant information. In GSM, for example, if class 1A bits are
identified as "damaged" (on the basis of the CRC), the BFI flag
is set (some manufacturers also set said flag in the event of
damaged 1B bits).
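The way a CRC over the class 1A bits drives the BFI can be illustrated with a toy check. This is a sketch only: it uses a 3-bit CRC with generator polynomial D^3 + D + 1, chosen for illustration (the actual code and bit ordering are fixed by the relevant channel-coding specification), and the optional strict_1b switch models the manufacturer-specific behaviour of also flagging damaged 1B bits, with class1b_ok standing in for however such damage is detected. POLY, crc_remainder and compute_bfi are hypothetical names.

```python
from typing import List

# Toy generator polynomial D^3 + D + 1, coefficients from highest degree down.
# Assumed for illustration; the real code is fixed by the channel-coding spec.
POLY = [1, 0, 1, 1]


def crc_remainder(bits: List[int], poly: List[int] = POLY) -> List[int]:
    """Remainder of bits(D) * D^r divided by poly(D) over GF(2), r = degree of poly."""
    r = len(poly) - 1
    work = list(bits) + [0] * r
    for i in range(len(bits)):
        if work[i]:
            for j, p in enumerate(poly):
                work[i + j] ^= p
    return work[-r:]


def compute_bfi(class1a: List[int], parity: List[int],
                class1b_ok: bool = True, strict_1b: bool = False) -> bool:
    """Set the BFI when the class 1A check fails; optionally also on damaged 1B bits."""
    bad_1a = crc_remainder(class1a) != list(parity)
    return bad_1a or (strict_1b and not class1b_ok)


bits = [1, 0, 1, 1, 0, 0, 1]
parity = crc_remainder(bits)           # computed at the transmitting side
print(compute_bfi(bits, parity))       # False
corrupted = bits.copy()
corrupted[2] ^= 1                      # single bit error on the radio path
print(compute_bfi(corrupted, parity))  # True
```

Because the generator polynomial has a non-zero constant term, any single-bit error in the protected bits changes the remainder, so the BFI is always set in that case; errors confined to unprotected bits pass unnoticed, which is why a frame with a clear BFI may still contain damaged class 2 bits.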

This need not mean, however, that all remaining bits are
damaged as well. The recogniser takes feature vectors as its
input (Rabiner & Juang, 1993); each speech frame is converted
into a feature vector. The values of the undamaged part of the
speech frame may still be offered to the recogniser. This may be
realised, e.g., by giving the corrupted features in the feature
vectors a specific value that has no effect on the score of the
signal received (De Veth, Cranen & Boves, 1998), or by ignoring
the entire frame (Lippmann & Carlson, 1997). In approximately
the same way, the SID parameter affects the speech recogniser 20.
The SID parameter is derived from the value of the SPeech flag
as output by the Voice Activity Detector 5 and transmitted by
transmitter 8. In the event of speech, the SP receives a
specific value, and so does the SID; should speech be lacking
(silence), the SP, and thereby the SID parameter, will receive
another value. The result is that the speech recogniser is
enabled when a real speech signal is transferred and disabled
in the absence of speech. Finally, as indicated above, the
operation of the speech recogniser 20 can be set as a function
of the encoding algorithm of the speech encoder 4 (e.g., FR,
EFR, AMR, etc.). In the figure, this is done by the parameter
CM determined by way of the
CLAIMS

1. Speech-processing system, comprising speech-recognition
means (20) for processing a signal entered from a source (1, 2)
to a speech input (DATA), characterised by means for affecting
the operation of the speech-recognition means by one or more
control parameters (CM, SID, BFI) entered by way of a control
input, each control parameter relating to a specific
characteristic of the signal entered from the source to the
speech-recognition means (DATA).

2. Speech-processing system according to claim 1,
characterised in that a first control parameter (BFI) relates to
the reliability or correctness of the signal entered and that the
operation of the speech-recognition means (20) is adjusted to the
reliability or correctness, as the case may be, indicated by said
first control parameter, of the signal entered.

3. Speech-processing system according to claim 1,
characterised in that a second control parameter (SID) relates to
the speech/noise ratio and that the operation of the
speech-recognition means (20) is adjusted to the speech/noise
ratio of the signal entered indicated by said second control
parameter.

4. Speech-processing system according to claim 1, the signal
entered to the speech-recognition means (20) being encoded in
speech-encoding means (4) at the source, characterised in that a
third control parameter (CM) relates to the speech-encoding mode
in the speech-encoding means, the operation of the
speech-recognition means (20) being adjusted to the
speech-encoding mode indicated by said third control parameter.
5. Telecommunications system, comprising a first terminal
(1, 2) having speech- and channel-encoding means (4, 6), a
transmission medium (9) and a second terminal (11, 12) having
channel- and speech-decoding means (13, 16) and a
speech-processing system according to claim 1, said signal
(DATA) being offered from the first terminal, by way of the
transmission medium, to the speech input of the speech
recogniser of the second terminal, and each control parameter
(CM, SID, BFI) being